Knowledge Discovery in GENBANK
Abstract
We describe various methods designed to discover knowledge in the GenBank nucleic acid sequence database. Using a grammatical model of gene structure, we create a parse tree of a gene from the features listed in its FEATURE TABLE. The parse tree infers features that are not explicitly listed but follow from the listed ones. When applied to a globin gene subset of GenBank, this method discovers 30% more introns and 40% more exons than are explicitly listed. Constructing the parse tree also entails resolving ambiguity and inconsistency within a FEATURE TABLE. We transform the parse tree into an augmented FEATURE TABLE that represents the inferred gene structure explicitly and unambiguously, which greatly improves the utility of the FEATURE TABLE to researchers. We then describe several analogical reasoning techniques designed to exploit the homologous nature of genes. We build a classification hierarchy that reflects the evolutionary relationships between genes. Descriptive grammars of gene classes are then induced from the instance grammars of individual genes. Case-based reasoning techniques use these abstract gene class descriptions to predict the presence and location of regulatory features not listed in the FEATURE TABLE. A cross-validation test shows a success rate of 87% on a globin gene subset of GenBank.
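A deliberately simplified Python sketch can illustrate the basic idea of inferring features that a FEATURE TABLE leaves implicit. It is not the authors' grammar-based parser. It assumes a single gene on one strand, GenBank-style 1-based inclusive coordinates, and that each intron spans exactly the bases between two adjacent exons. The function name `infer_introns` and the `(kind, start, end)` tuple layout are hypothetical, and so are the example coordinates.

```python
from typing import List, Tuple

# A feature is (kind, start, end) with 1-based, inclusive coordinates,
# in the spirit of GenBank feature locations such as "exon 1..92".
# This tuple layout is a hypothetical simplification for illustration.
Feature = Tuple[str, int, int]


def infer_introns(features: List[Feature]) -> List[Feature]:
    """Return the features plus any introns implied by adjacent exons.

    Assumes every exon belongs to one gene on one strand and that an
    intron covers exactly the bases between two consecutive exons.
    """
    exons = sorted((f for f in features if f[0] == "exon"), key=lambda f: f[1])
    # Spans of introns the table already lists, so they are not duplicated.
    listed = {(start, end) for kind, start, end in features if kind == "intron"}
    inferred: List[Feature] = []
    for (_, _, prev_end), (_, next_start, _) in zip(exons, exons[1:]):
        if next_start <= prev_end:
            # Overlapping exons cannot alternate cleanly with introns;
            # report the inconsistency instead of guessing a resolution.
            raise ValueError(
                f"exon ending at {prev_end} overlaps exon starting at {next_start}"
            )
        start, end = prev_end + 1, next_start - 1
        # Exons that abut (no bases between them) imply no intron.
        if start <= end and (start, end) not in listed:
            inferred.append(("intron", start, end))
    # Augmented table: original plus inferred features, ordered by position.
    return sorted(list(features) + inferred, key=lambda f: (f[1], f[2]))


if __name__ == "__main__":
    # Hypothetical three-exon gene; coordinates are illustrative only.
    table = [("exon", 1, 92), ("exon", 223, 445), ("exon", 1296, 1424)]
    for kind, start, end in infer_introns(table):
        print(f"{kind:<7}{start}..{end}")
```

For the hypothetical three-exon table, the sketch returns an augmented table with the two implied introns, 93..222 and 446..1295, interleaved between the listed exons. Overlapping exons raise an error rather than being silently reconciled. This is only a crude stand-in for the ambiguity and inconsistency resolution that parse-tree construction entails.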
Similar resources
Knowledge Discovery in Biosequences Using Sort Regular Patterns
This paper considers knowledge discovery by sort regular patterns, which are strings over sort letters representing finite sets of basic letters. We devise a learning algorithm for the class based on the minimal multiple generalization technique, and evaluate the method by experiments on biosequences from the GenBank database. The experiments show that a relatively simple sort pattern can represent a...
Book review on Bioinformation Discovery: Data to knowledge in Biology
Researchers in almost all disciplines of biology use mathematical and statistical models and computer programs to analyze and validate biological data. In recent years, systematic data mining has frequently been carried out with bioinformatics software. The available bioinformatics techniques and tools help annotate functions for newly generated data in biological investigation. The book (ISBN 978-1-441...
Designing an Ontology for Knowledge Discovery in Iran’s Vaccine
An ontology is a requirements engineering product and the key to knowledge discovery. It includes the terminology to describe a set of facts, assumptions, and relations with which the detailed meanings of vocabularies among communities can be determined. This is a qualitative content analysis study. This study has used an ontology for the first time to discover knowledge about vaccines in Ir...
Commercializing Knowledge: University Science, Knowledge Capture, and Firm Performance in Biotechnology - Science & Cents conference 2002 - FRB Dallas
Our research program over the past 10 years has focused on the use of basic science knowledge in commercial firms and the impact of that knowledge on firm performance. In our earlier research, we have found substantial consistent evidence that top academic science, specifically the star scientists who make most of the defining discoveries, provides intellectual human capital that defines the te...
Journal title:
- Proceedings. International Conference on Intelligent Systems for Molecular Biology
Volume 1
Publication date: 1993